ABSTRACT: Cytochrome P450 (CYP) 3A4, 2D6, 2C9, 2C19, and 1A2 are the most
important  drug-metabolizing  enzymes  in  the  human  liver.  Knowledge  of  which  parts  of  a  drug 
molecule  are  subject  to  metabolic  reactions  catalyzed  by  these  enzymes  is  crucial  for  rational 
drug  design  to  mitigate  ADME/toxicity  issues.  SMARTCyp,  a  recently  developed  2D  ligand 
structure-based  method,  is  able  to  predict  site-specific  metabolic  reactivity  of  CYP3A4  and 
CYP2D6  substrates  with  an  accuracy  that  rivals  the  best  and  more  computationally  demanding 
3D  structure-based  methods.  In  this  article,  the  SMARTCyp  approach  was  extended  to  predict 
the  metabolic  hotspots  for  CYP2C9,  CYP2C19,  and  CYP1A2  substrates.  This  was 
accomplished  by  taking  into  account  the  impact  of  a  key  substrate-receptor  recognition 
feature  of  each  enzyme  as  a  correction  term  to  the  SMARTCyp  reactivity.  The  corrected 
reactivity  was  then  used  to  rank  order  the  likely  sites  of  CYP-mediated  metabolic  reactions. 

For  60  CYP1A2  substrates,  the  observed  major  sites  of  CYP1A2  catalyzed  metabolic  reactions 
were  among  the  top-ranked  1,  2,  and  3  positions  in  67%,  80%,  and  83%  of  the  cases, 
respectively.  The  results  were  similar  to  those  obtained  by  MetaSite  and  the  reactivity  +  docking  approach.  For  70  CYP2C9 
substrates,  the  observed  sites  of  CYP2C9  metabolism  were  among  the  top-ranked  1,  2,  and  3  positions  in  66%,  86%,  and  87%  of 
the  cases,  respectively.  These  results  were  better  than  the  corresponding  results  of  StarDrop  version  5.0,  which  were  61%,  73%, 
and  77%,  respectively.  For  36  compounds  metabolized  by  CYP2C19,  the  observed  sites  of  metabolism  were  found  to  be  among 
the  top-ranked  1,  2,  and  3  sites  in  78%,  89%,  and  94%  of  the  cases,  respectively.  The  computational  procedure  was  implemented 
as  an  extension  to  the  program  SMARTCyp  2.0.  With  the  extension,  the  program  can  now  predict  the  site  of  metabolism  for  all 
five  major  drug-metabolizing  enzymes  with  an  accuracy  similar  to  or  better  than  that  achieved  by  the  best  3D  structure-based 
methods.  Both  the  Java  source  code  and  the  binary  executable  of  the  program  are  freely  available  to  interested  users. 


1.  INTRODUCTION 

The  cytochrome  P450  superfamily  of  enzymes  (abbreviated  as 
CYP)  comprises  a  large  and  diverse  group  of  proteins  with  the 
heme  cofactor.1  In  humans,  they  transform  lipophilic  drugs  to 
more  polar  compounds  that  can  be  excreted  by  the  kidneys 
and,  therefore,  play  important  roles  in  defining  a  drug’s 
pharmacokinetic profile.2 They also contribute to drug-drug
interactions  and  metabolism-dependent  toxicity  issues.3  Among 
the  CYP  enzymes,  CYP3A4,  2D6,  2C9,  2C19,  and  1A2  are  the 
most  important  members  in  human  liver,  the  principal  organ 
for  phase  1  metabolism  and  clearance  of  drugs  and  other 
chemicals.4 Together, the five enzymes metabolize approximately
90% of marketed drugs.5-7 The ability to predict
substrate-specific sites of metabolism (SOM) of these enzymes
is  essential  for  rational  drug  design  to  mitigate  ADME  and 
toxicity  issues  of  a  drug  candidate. 

The  mechanism  of  CYP-mediated  metabolism  is  complex 
and consists of multiple steps.8 However, there is experimental
evidence  indicating  that  the  rate-determining  step  at  least 
partially  involves  hydrogen  or  electron  abstraction  from  the 
substrate  followed  by  oxygen  rebound  or  a  concerted 
oxygenation  via  formation  of  a  sigma  complex  between  the 
substrate and the FeO3+ complex.8 Chemical reactivity,
therefore, is an important determinant for the SOM of a
substrate.  On  the  other  hand,  due  to  the  different  shapes  and 
sizes  of  substrate  binding  cavities  of  the  CYP  enzymes  and  their 
characteristic  substrate  recognition  features,  substrate  exposure 
to  the  catalytic  moiety  may  be  restricted.  As  a  result,  the  most 
reactive  site  of  a  substrate  may  not  be  the  observed  SOM.  This 
underscores  the  necessity  of  considering  substrate-receptor 
recognition  in  predicting  SOM. 

In  principle,  a  promising  method  for  predicting  substrate 
accessibility  to  the  CYP  catalytic  moiety  is  docking  simulation. 
Many  docking  studies  aimed  at  predicting  substrate  sites  of 
metabolism have been published in recent years.9-12 Major
challenges  for  this  approach  include  difficulties  associated  with 
proper  representation  of  a  protein  and  proper  accounting  of  its 
flexibility,  the  lack  of  a  score  function  that  does  not  require 
tuning  by  a  user  with  a  priori  knowledge  for  ranking  the  docked 
poses,  and  insufficient  force  field  parameters  to  describe 
interactions between the substrate and the FeO3+ complex.13
Recently,  Moors  et  al.  demonstrated  an  approach  to  account 
for  protein  flexibility  in  a  CYP2D6  docking  study.14  They 
generated an ensemble of 1,000 CYP2D6 conformations
starting  with  the  X-ray  structure  of  apo-CYP2D6.  Docking 
simulations  were  performed  on  every  conformer  of  the 
ensemble  for  many  CYP2D6  substrates.  When  information 
from  the  docking  simulations  was  combined  with  the  estimated 
site  reactivity  of  the  substrates,  reliable  SOM  predictions  were 
achieved.  However,  even  with  the  availability  of  today’s 
massively  parallel  computers,  docking  simulations  using  an 
ensemble  of  thousands  of  protein  conformations  and 
processing  a  huge  number  of  docked  poses  are  still  time- 
consuming  and  not  practical  for  virtual  screening  of  a  large 
number  of  compounds. 

Moors  and  co-workers  also  showed  that  predictions  based  on 
docking  to  any  single  protein  conformation  are  significantly  less 
reliable  as  judged  by  the  scores  of  receiver  operating 
characteristic  (ROC)  curves  of  their  predictions.14  This  is  in 
agreement  with  results  of  a  recent  study  by  Afzelius  et  al.  who 
evaluated  SOM  predictions  for  CYP3A4  and  CYP2C9  based  on 
docking  using  Dock  and  Glide  software.15  For  some  structurally 
diverse  CYP2C9  and  CYP3A4  substrates,  they  found  that  the 
top-ranked  sites,  based  on  docking  to  a  single  protein 
conformation,  were  among  the  observed  SOM  only  30%  to 
40% of the time. This accuracy is lower than that of predictions given
by an experienced biotransformation scientist, which was
approximately 50%.15 Furthermore, it is significantly lower than that of
predictions based on C-H bond orders of the substrates,
calculated by density functional theory (B3LYP) with a small
3-21G basis set, which achieved around 60% accuracy.15

Chemical  reactivity  is  the  basis  of  SOM  predictions  of 
CYP3A4-mediated  metabolism  by  Singh  et  al.  who  estimated 
hydrogen abstraction energies by AM1 molecular orbital
calculations and predicted SOM by the energies and surface
area of the hydrogen atoms.16 AM1 estimation of substrate
reactivity is also the basis of SOM prediction by StarDrop, a
commercial software package, which makes on-the-fly AM1
calculations.17 In addition, StarDrop makes corrections to the
AM1 energies to account for steric accessibility and orientation
effects  via  models  trained  using  a  large  number  of  substrates  of 
CYP3A4,  2D6,  and  2C9,  the  top  three  major  drug-metabolizing 
P450  enzymes. 

Another  popular  commercial  package  for  SOM  prediction  is 
MetaSite.18  It  identifies  likely  sites  of  metabolism  by  fitness  of 
3D  structures  of  a  substrate  oriented  within  the  catalytic  sites  of 
the  CYP  enzymes  represented  by  GRID  molecular  interaction 
fields.19  For  a  large  number  of  CYP3A4  substrates,  Zhou  et  al. 
found  that  the  observed  SOM  were  among  the  three  top- 
ranked  sites  by  MetaSite  in  78%  of  the  cases  studied.20 
However,  the  success  rate  for  the  observed  SOM  to  be  ranked 
the  highest  is  significantly  lower.13 

Generally  speaking,  3D  molecular  structure-based  prediction 
methods,  such  as  docking  into  a  large  number  of  protein 
conformations  and/or  quantum  mechanical  calculation  of 
substrate  reactivity  at  appropriate  levels  of  theory,  are 
computationally  demanding  and  not  easily  applicable  to  virtual 
screening  of  a  large  number  of  compounds.  An  alternative  is  to 
estimate  the  reactivity  of  molecular  fragments  by  high-level 
quantum  mechanical  calculations  on  representative  molecules 
and  assign  reactivity  to  different  sites  of  a  substrate  by  matching 
structural  patterns.  This  is  the  approach  used  by  SMARTCyp,  a 
2D  substrate  structure-based  SOM  prediction  method.21  To 
predict  SOM  of  CYP3A4-mediated  reactions,  the  SMARTCyp 
energy  of  each  potential  site  is  corrected  by  an  accessibility 
descriptor. The corrected energies are then used to rank order
likely sites of CYP3A4 catalyzed reactions. For 394 CYP3A4
substrates,  the  observed  SOM  were  among  the  top-ranked  one, 
two,  and  three  sites  65%,  76%,  and  81%  of  the  time, 
respectively. These results are at least as good as the
performance of StarDrop21 and MetaSite.22 To predict
CYP2D6  catalyzed  SOM,  SMARTCyp  introduced  two  2D 
descriptors  to  correct  the  energy  of  each  atom,  an  accessibility 
descriptor  and  a  descriptor  to  account  for  the  impact  of  the 
characteristic receptor-ligand recognition between CYP2D6
and  a  positively  charged  ligand.23  When  the  approach  was 
applied  to  predict  the  SOM  of  a  large  number  of  CYP2D6 
substrates,  its  performance  was  shown  to  be  better  than 
available commercial packages.23

Encouraged  by  the  success  of  SMARTCyp,  we  wondered  if 
the  same  approach  could  be  applied  to  predict  the  SOM  of 
other  major  drug-metabolizing  enzymes,  namely  CYP1A2, 
CYP2C9,  and  CYP2C19.  In  this  article,  we  demonstrate  that 
taking  into  account  the  effect  of  a  single  key  receptor-substrate 
recognition  feature  for  each  of  the  enzymes,  the  SMARTCyp 
approach  can  be  extended  to  provide  accurate  predictions  of 
the  SOM  catalyzed  by  these  enzymes. 

2.  METHOD  AND  RESULTS 

2.1.  SMARTCyp  Score  Functions  for  Ranking  CYP3A4 
and  CYP2D6  SOM.  SMARTCyp  uses  the  following  score 
function  to  rank  likely  sites  of  CYP3A4  catalyzed  reactions 

Score_3A4 = E - 8 × A (kJ/mol)    (1)

In  this  equation,  E  is  an  estimate  of  the  activation  energy 
required  for  a  CYP-catalyzed  reaction  at  a  specific  atom  of  the 
substrate.  It  is  assigned  to  each  atom  of  a  substrate  by  matching 
SMARTS  patterns24  to  a  lookup  table  of  energies  derived  from 
density  functional  theory  calculations.  The  accessibility 
descriptor, A, is defined as the ratio of the longest
bond path from a given atom to the longest bond path
present in the whole molecule.
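
As an illustration of eq 1, the following minimal Python sketch computes the accessibility descriptor A on a toy molecular graph and applies the 8 kJ/mol weighting. It assumes that the "longest bond path from a given atom" means the number of bonds to the topologically farthest atom, found by breadth-first search; that reading, the adjacency-list input, the function names, and the example energy are illustrative assumptions and are not taken from the SMARTCyp source code.

```python
from collections import deque

def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

def accessibility(adjacency):
    """A per atom: distance to its farthest atom divided by the largest such distance."""
    farthest = {a: max(bond_distances(adjacency, a).values()) for a in adjacency}
    longest = max(farthest.values())
    # A single-atom graph has no bonds; returning 1.0 there is an arbitrary choice.
    return {a: (f / longest if longest else 1.0) for a, f in farthest.items()}

def score_3a4(energy, a):
    """Eq 1: E - 8 * A, in kJ/mol."""
    return energy - 8.0 * a

# Heavy-atom skeleton of n-butane, C0-C1-C2-C3 (toy input).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
A = accessibility(butane)
example_score = score_3a4(50.0, A[0])  # 50.0 kJ/mol is a made-up energy
```

In this toy case the terminal carbons obtain A = 1 and the inner carbons A = 2/3, so terminal atoms receive the larger score reduction.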

The  SMARTCyp  score  function  for  ranking  likely  sites  of 
CYP2D6-mediated  metabolism  is 

Score_2D6 = E + Span2End_correction + N+Dist_correction    (2)

where N+Dist_correction = 6.7 × (8 - N+Dist) when N+Dist < 8,
and N+Dist_correction = 0 when N+Dist > 8; Span2End_correction
= 6.7 × Span2End when Span2End < 4, and
Span2End_correction = 6.7 × 4 + 0.01 × Span2End when
Span2End > 4.

The  Span2End  descriptor  accounts  for  the  observation  that 
atoms  in  the  middle  of  a  compound  are  less  likely  to  be 
CYP2D6-mediated  SOM  than  atoms  at  the  ends  of  the 
molecule. N+Dist is the maximum distance in number of
chemical bonds between a likely SOM and a protonated nitrogen
atom.  It  accounts  for  the  fact  that  the  Glu216  and/or  Asp301 
residues  of  CYP2D6  tend  to  form  a  characteristic  salt  bridge 
with  a  protonated  nitrogen  atom  in  the  substrate.25  Because  the 
Glu216  and  Asp301  residues  are  located  far  away  from  the 
catalytic  heme  moiety,  substrate  atoms  close  to  the  protonated 
nitrogen  atom  are  less  likely  to  be  CYP2D6-mediated  SOM. 
The constant, 6.7 kJ/mol, and the threshold values of N+Dist
and  Span2End  were  determined  using  a  training  set  of  86 
CYP2D6  substrates. 
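
The piecewise corrections of eq 2 can be sketched in Python as follows. The descriptor values are taken as precomputed inputs. Two choices here are assumptions rather than statements from the text: a value of Span2End exactly equal to 4 is assigned to the second branch (the printed conditions leave it unassigned), and a substrate without a protonated nitrogen is given no N+Dist correction.

```python
def n_dist_correction(n_dist, c=6.7, cutoff=8):
    """N+Dist_correction of eq 2, in kJ/mol."""
    if n_dist is None:  # assumption: no protonated nitrogen means no correction
        return 0.0
    # At n_dist == cutoff the linear formula already gives zero.
    return c * (cutoff - n_dist) if n_dist < cutoff else 0.0

def span2end_correction(span2end, c=6.7, threshold=4):
    """Span2End_correction of eq 2, in kJ/mol."""
    if span2end < threshold:
        return c * span2end
    # Assumption: span2end == threshold is handled by this branch.
    return c * threshold + 0.01 * span2end

def score_2d6(energy, span2end, n_dist):
    """Eq 2: E + Span2End_correction + N+Dist_correction."""
    return energy + span2end_correction(span2end) + n_dist_correction(n_dist)
```

For example, an atom three bonds from the protonated nitrogen receives a penalty of 6.7 × (8 - 3) = 33.5 kJ/mol, whereas an atom eight or more bonds away receives none.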

2.2. SOM Prediction for CYP1A2-Catalyzed Reactions.

CYP1A2 constitutes approximately 14% of human liver P450
enzymes in Caucasians26 and about 18% in Asians.4 It is
responsible  for  the  clearance  of  about  5%  of  the  top  200  drugs 
on  the  U.S.  market.27  Its  catalytic  site  has  been  well 
characterized.28-31 The X-ray structure of the protein
cocrystallized with α-naphthoflavone (ANF) indicates that the
substrate-binding  cavity  of  CYP1A2  is  narrow  and  lined  by 
amino  acid  residues  that  define  a  relatively  planar  substrate 
binding  platform.”2  ANF  is  a  potent  and  competitive  inhibitor 
of CYP1A2-catalyzed reactions. Its high binding affinity to
CYP1A2  was  attributed  to  the  overall  fitness  of  the  shape  and 
size  of  the  molecule  to  the  binding  site  cavity  and  the  resulting 
dense  and  extensive  van  der  Waals  interactions  with  the 
nonpolar  side  chains  of  the  protein.  Additionally,  it  was  noted 
that  there  is  a  water  molecule  close  to  the  carbonyl  group  of 
ANF  that  provides  an  extra  binding  interaction.  This  water 
molecule is the only one present in the active site, and there
appear to be no solvent channels that connect the active site
cavity  with  the  protein  surface.  As  shown  in  Figure  1,  the  water 
molecule  appears  to  be  hydrogen-bonded  to  the  carbonyl  of 
ANF  as  well  as  to  the  carbonyl  of  Gly316.29 


Figure  1.  The  X-ray  crystal  structure  of  human  CYP1A2  cocrystallized 
with a potent inhibitor, α-naphthoflavone, showing tight binding due
to π-π stacking with residue Phe226 and a structured water molecule
in  hydrogen  bonding  interactions  with  the  ligand  carbonyl  group  at 
one  end  and  with  the  carbonyl  group  of  Gly316  at  the  other  end.  The 
ligand-protein  interactions  orient  the  ligand  so  that  the  4'  position  of 
the  phenyl  ring  is  exposed  to  the  heme.  For  clarity,  other  parts  of  the 
protein  are  not  displayed. 


Rydberg et al. showed that reliable SOM prediction for
CYP1A2 substrates can be achieved by combining site reactivity
derived from density functional theory calculations with
docking into the X-ray crystal structure of CYP1A2.32 For 60
CYP1A2 substrates, the accuracy of SOM prediction by the
reactivity + docking approach is similar to that of MetaSite
version 3.0, as shown in Table 1. Both the docking and
MetaSite calculations require 3D molecular structure information
and evaluation of the fitness of substrate molecular shape
and  size  to  the  binding  site  cavity.  Encouraged  by  the  success  of 
SMARTCyp  for  CYP3A4  and  CYP2D6  SOM  predictions, 
which  use  2D  molecular  structure  information  of  the  substrates 
only, we examined whether prediction accuracies similar to those
achieved by the reactivity + docking method and MetaSite could
be  achieved  for  CYP1A2  by  the  SMARTCyp  approach. 

We  first  examined  the  performance  of  using  SMARTCyp 
reactivity only, without any substrate-CYP1A2 recognition
information,  to  predict  SOM  of  CYP1A2  mediated  metabolic 
reactions.  We  examined  two  schemes  of  SMARTCyp  reactivity 
as  defined  by  eq  1  and  by  the  following  equation 

Score_2D6' = E + Span2End_correction    (3)

Equation  1  is  the  scoring  function  for  CYP3A4-mediated 
SOM.  Equation  3  is  equal  to  the  scoring  function  for  CYP2D6- 
catalyzed  SOM  minus  the  N+Dist_correction  term.  The 
N+Dist_correction  term  was  excluded  in  eq  3  because  Rydberg 
et  al.  used  it  to  account  for  the  effect  of  CYP2D6-substrate 
recognition,  which  is  not  applicable  in  the  case  for  CYP1A2. 

Table  1  compares  the  performance  of  SOM  predictions  by 
Score_3A4  and  Score_2D6',  for  the  60  CYP1A2  substrates 
used  by  Rydberg  et  al.  in  their  recent  study,32  with  the 
performance  of  MetaSite  and  Rydberg’s  reactivity  +  docking 
approach.  Molecular  structures  of  the  60  CYP1A2  substrates, 
observed  SOM,  and  predicted  top  three  SOM  by  Score_3A4 
and Score_2D6' are provided in Figure S1 in the Supporting
Information. Compared to the percentage of observed SOM in
randomly picked sites (column random in Table 1), both
Score_3A4 and Score_2D6' performed reasonably well,
considering none of the information was CYP1A2-specific.
These results indicate that chemical reactivity is the most
important determinant for SOM of CYP1A2-catalyzed
metabolic  reactions. 

Table  1  also  shows  that  Score_2D6'  outperformed 
Score_3A4  in  all  four  performance  criteria  (percentage  of  the 
observed  major  sites  of  metabolism  found  in  the  top-ranked  1, 
2,  and  3  sites,  as  well  as  the  percentage  of  observed  major  or 
minor  sites  of  metabolism  found  in  the  highest  ranked  sites).  It 
clearly  indicates  that  Score_2D6'  is  a  better  representation  of 
CYP1A2  substrate  site  reactivity  than  Score_3A4.  This  can  be 
rationalized  by  noting  that  the  CYP3A4  substrate  binding  cavity 
is  large  and  very  flexible,  while  the  CYP1A2  binding  site  cavity 
is  much  smaller  and  narrower,  and,  hence,  the  latter  is  more 
similar  to  the  substrate  binding  site  cavity  of  CYP2D6. 


Table 1. Performance Comparison for CYP1A2-Catalyzed Site of Metabolism (SOM) Prediction on 60 CYP1A2 Substrates^a

| | Score_3A4^b | Score_2D6'^c | Score_2D6' (w/o rCO) | Score_2D6' (w/ rCO) | Score_1A2^d | reactivity + docking^e | MetaSite^e | random |
|---|---|---|---|---|---|---|---|---|
| number of compounds | 60 | 60 | 32 | 28 | 60 | 60 | 60 | 60 |
| 1st ranked site is a major SOM | 55 | 57 | 72 | 39 | 67 | 67 | 68 | 15 |
| 1st or 2nd ranked site is a major SOM | 70 | 73 | 84 | 61 | 80 | 72 | 78 | 29 |
| 1st, 2nd, or 3rd ranked is a major SOM | 78 | 82 | 91 | 71 | 83 | 87 | 85 | 42 |
| 1st ranked is a major or minor SOM | 67 | 72 | 81 | 61 | 78 | 77 | 77 | 19 |

^a Numerical values in the table are percentages of the substrates for which the SOM were correctly ranked. ^b Score_3A4 is defined by eq 1.
^c Score_2D6' is defined by eq 3. ^d Score_1A2 is defined by eq 5. ^e Reference 32.

Figure 2. SOM of CYP1A2-catalyzed reactions. Red arrows: observed major sites. Blue arrows: observed minor sites. Red numbers: top-ranked sites
by Score_1A2. For molecules with symmetrically equivalent sites, only one of the equivalent sites is labeled.

Figure 3. (A) X-ray structure of human CYP2C9 cocrystallized with flurbiprofen showing hydrogen bonding interactions between the anionic
carboxyl group and the Asn204 and Arg108 side chains of the protein. For clarity, other parts of the protein are not displayed. (B) Molecular
structure of n-undecane optimized by the MMFF force field.

However, even with Score_2D6', the performance of SOM
prediction for CYP1A2 substrates was generally inferior to that
of MetaSite or the reactivity + docking approach. Detailed
examination of the structures of the 60 substrates revealed that
28 of them have a carbonyl group with the carbon atom being
part  of  a  ring.  For  the  32  molecules  without  a  cyclic  carbonyl 
moiety,  SOM  prediction  given  by  Score_2D6'  significantly 
outperformed  that  of  MetaSite  and  the  reactivity  +  docking 
approach,  as  shown  in  Table  1.  However,  for  the  28 
compounds  with  a  cyclic  carbonyl  moiety,  SOM  prediction 
given  by  Score_2D6'  alone  was  significantly  worse.  This 
indicates  that  the  cyclic  carbonyl  group  may  be  an  important 
contributor to the SOM of CYP1A2-catalyzed reactions. It
could be a key substrate recognition feature of the CYP1A2
enzyme.  The  substrate-enzyme  recognition  feature  may  orient 
the  substrate  so  that  some  reactive  sites  of  the  substrate  are  out 
of  reach  by  the  catalytic  heme  moiety.  Figure  1  shows  a  likely 
interaction  between  the  carbonyl  group  of  a  substrate  and  the 
enzyme: the hydrogen bonding interaction between the
carbonyl oxygen and a water molecule. The water molecule, in
turn, forms hydrogen bonding interactions with the carbonyl
oxygen of Gly316 of the protein, as reported by Sansen et al.29
This is an example of a water molecule serving as a bridge for
receptor-ligand recognition. The fact that a ring carbonyl is
required for this receptor-ligand interaction can be explained
by  the  narrow  and  flat  binding  cavity  of  the  enzyme.  A 
noncyclic  carbonyl  group  may  not  be  as  effectively  anchored  to 
the  binding  site  because  of  steric  hindrance  from  the  left  and 
right  groups  attached  to  the  carbonyl  carbon  atom.  The  ring 
ties  the  two  groups  back,  making  the  carbonyl  oxygen 
effectively  exposed  for  hydrogen  bonding  interactions  with 
the  water  molecule.  The  ring  moiety  fits  the  flat  binding  cavity 
well  and  forms  favorable  van  der  Waals  interactions  with  the 
hydrophobic  side  chains  of  the  protein. 

Figure  1  shows  that,  because  of  the  ring  carbonyl-water 
hydrogen  bonding  interactions,  carbon  atoms  at  the  3',  4',  and 
5' positions of the phenyl ring are about 5 Å from the heme
iron, a distance generally considered to be within the reach of
the FeO3+ catalytic moiety of the heme.33 On the other hand,
atoms within four chemical bonds of the carbonyl carbon atom
appear too far from the catalytic moiety. However, this is only a
static picture of a protein-bound ligand. In reality, both the
protein and the ligand have some degrees of freedom. In light
of the SMARTCyp approach for predicting SOM of CYP2D6-
mediated metabolic reactions, we introduced a correction term
to account for the effect of substrate-CYP1A2 recognition. This
term is called Dist2rCO_correction and is defined as

Dist2rCO_correction = c × (Dist2rCO_cutoff - Dist2rCO)    (4)

In  eq  4,  Dist2rCO  denotes  the  distance  in  number  of 
chemical  bonds  between  a  substrate  atom  of  interest  and  the 
most  distant  cyclic  carbonyl  carbon  of  the  substrate.  The 
Dist2rCO_cutoff  is  a  threshold  bond  distance  above  which  no 
correction is needed. Equation 4 is similar to the
N+Dist_correction term in eq 2 for CYP2D6-catalyzed SOM.
Rydberg and Olsen selected the constant c to be 6.7 kJ/mol and
N+Dist_cutoff to be 8 for CYP2D6 by using a training set of 86
CYP2D6  substrates.  In  the  case  of  CYP1A2,  we  do  not  have  a 
large  number  of  substrates  as  a  training  set  for  determining 
these  constants.  However,  we  found  that  the  results  were  not 
very  sensitive  to  the  value  of  the  constant  c,  as  long  as  it  was 
approximately 10 kJ/mol. Therefore, for simplicity, we assigned
c = 10 kJ/mol and experimented with bond distance cutoff
values of Dist2rCO_cutoff. For the 28 substrates with cyclic
carbonyl groups, the best performance was achieved with
Dist2rCO_cutoff = 6. In the end, our score function for ranking
likely SOM for CYP1A2-catalyzed metabolic reactions was

Score_1A2 = E + Span2End_correction + Dist2rCO_correction    (5)

where Dist2rCO_correction = 10.0 × (6 - Dist2rCO) when
Dist2rCO < 6, and Dist2rCO_correction = 0 when Dist2rCO >
6.
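
A minimal Python sketch of the Dist2rCO descriptor and the correction in eq 5 is given here for illustration. It assumes that the ring carbonyl carbons have already been identified (for example by substructure matching, which is not shown) and that the molecule is supplied as a connected adjacency list. The toy graph, a cyclohexanone ring carrying a propyl chain, and all function names are hypothetical.

```python
from collections import deque

def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

def dist2rco(adjacency, atom, ring_carbonyl_carbons):
    """Bond distance from `atom` to the most distant cyclic carbonyl carbon."""
    if not ring_carbonyl_carbons:
        return None  # no cyclic carbonyl present
    dist = bond_distances(adjacency, atom)
    return max(dist[c] for c in ring_carbonyl_carbons)

def dist2rco_correction(d, c=10.0, cutoff=6):
    """Dist2rCO_correction of eq 5, in kJ/mol; zero at or beyond the cutoff."""
    if d is None or d >= cutoff:
        return 0.0
    return c * (cutoff - d)

def score_1a2(energy, span2end_corr, d):
    """Eq 5, with Span2End_correction supplied as a precomputed value."""
    return energy + span2end_corr + dist2rco_correction(d)

# Toy graph: ring atoms 0-5, carbonyl carbon 0, oxygen 6, propyl chain 7-8-9 on atom 3.
toy = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4, 7], 4: [3, 5],
       5: [4, 0], 6: [0], 7: [3, 8], 8: [7, 9], 9: [8]}
penalties = {a: dist2rco_correction(dist2rco(toy, a, [0])) for a in (1, 7, 9)}
```

In this toy case the ring atom adjacent to the carbonyl carbon is penalized by 50 kJ/mol, the first chain atom by 20 kJ/mol, and the terminal chain atom, six bonds away, not at all.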

With the Dist2rCO_correction term to account for the effect
of CYP1A2-substrate recognition, the SOM prediction based
on Score_1A2 improved significantly. Performance of SOM
prediction for the 60 substrates is shown in Table 1 under the
column Score_1A2. The molecular structures of the 60
substrates, observed SOM, and predicted top three SOM by
Score_1A2 are given in Figure 2.

Based on the four criteria in Table 1, the 2D ligand-based
Score_1A2 performs comparably to the 3D structure-based
MetaSite and the reactivity + docking approach, indicating that
the water-mediated substrate recognition feature is an
important determinant for CYP1A2-catalyzed metabolic
reactions. 

2.3.  SOM  Prediction  for  CYP2C9-Catalyzed  Reactions. 

According  to  Rowland-Yeo  et  al.,  nearly  20%  of  the  CYP 
enzymes  in  the  human  liver  are  CYP2C9.26  CYP2C9 
metabolizes about 15% of marketed drugs27 and is known to
exhibit  selectivity  for  the  oxidation  of  relatively  small,  lipophilic 
anions.34  Figure  3  shows  the  X-ray  crystal  structure  of  human 
CYP2C9  cocrystallized  with  flurbiprofen.  The  structure 
indicates  that  the  carboxyl  group  of  flurbiprofen  forms 
hydrogen bonding interactions with the Arg108 and Asn204
side chains of the protein.35 Since the Arg108 and Asn204 side
chains  are  at  the  opposite  end  to  the  heme  in  the  substrate 
binding  cavity,  substrate  atoms  in  close  proximity  to  the 
carboxyl  group  that  forms  hydrogen  bonding  interactions  with 
the  side  chains  are  less  likely  to  be  the  SOM  because  of  their 
distance  from  the  catalytic  moiety.  The  high  resolution  crystal 
structure  showed  that  the  carboxyl  carbon  atom  of  flurbiprofen 
is 13.2 Å from the heme iron. The distance is close to 10 C-C
single bonds (~12.6 Å), as shown in Figure 3. It is generally
considered that substrate atoms within ~5 Å of the heme iron
are accessible by the FeO3+ catalytic moiety and may become
the site of CYP-mediated metabolic reactions.33 Based on the
MMFF force field calculations, 5 Å is approximately the
distance between the terminal carbon atoms in n-pentane, or
four C-C bonds, as shown in Figure 3(B). Taken together,
Figure  3  indicates  that  substrate  atoms  within  five  chemical 
bonds  of  the  carboxyl  group  have  reduced  probability  to  be 
sites  of  CYP2C9  catalyzed  metabolic  reactions.  Furthermore, 
the  closer  an  atom  is  to  the  carboxyl  group,  the  less  likely  it  is  to 
be  metabolized  by  CYP2C9.  On  the  basis  of  this  key  substrate- 
receptor recognition feature and consistent with the SMARTCyp
approach, a reasonable score function for ranking likely
sites of CYP2C9 metabolism is

Score_2C9 = E + Span2End_correction + Dist2CO_correction    (6)

where Dist2CO_correction = 10 × (6 - Dist2CO) when Dist2CO
< 5, and Dist2CO_correction = 0 when Dist2CO > 6.

In  eq  6,  Dist2CO  is  the  distance  in  number  of  chemical  bonds 
between  a  substrate  atom  of  interest  and  the  most  distant 
carboxyl  carbon  of  the  substrate.  The  Dist2CO_correction  term 
accounts for the effect of substrate-receptor recognition on
CYP2C9  SOM. 
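
To illustrate how the Dist2CO_correction term can change the ranking, the following Python sketch scores two hypothetical sites with eq 6 and sorts them. It assumes that lower scores rank higher, since E is an activation energy estimate and the corrections are penalties. It also applies the linear formula at Dist2CO = 5, a value that the conditions as printed do not assign. The site labels, energies, and descriptor values are invented for the example.

```python
def span2end_correction(span2end, c=6.7, threshold=4):
    """Span2End_correction as in eq 2; span2end == 4 uses the second branch (assumption)."""
    if span2end < threshold:
        return c * span2end
    return c * threshold + 0.01 * span2end

def dist2co_correction(dist2co, c=10.0, cutoff=6):
    """Dist2CO_correction of eq 6: the penalty grows as the atom nears the carboxyl carbon."""
    if dist2co is None:  # assumption: no carboxyl group means no penalty
        return 0.0
    return c * max(0, cutoff - dist2co)

def score_2c9(energy, span2end, dist2co):
    """Eq 6: E + Span2End_correction + Dist2CO_correction, in kJ/mol."""
    return energy + span2end_correction(span2end) + dist2co_correction(dist2co)

def rank_sites(sites):
    """Order site labels by increasing Score_2C9; sites maps label -> (E, Span2End, Dist2CO)."""
    return sorted(sites, key=lambda name: score_2c9(*sites[name]))

# Hypothetical sites: a reactive atom next to the carboxyl group and a
# less reactive atom far from it.
toy_sites = {
    "C_alpha": (40.0, 1, 1),
    "C_far": (60.0, 0, 7),
}
ranking = rank_sites(toy_sites)
```

Without the distance penalty the more reactive C_alpha site would rank first (46.7 versus 60.0 kJ/mol); with it, C_alpha scores 96.7 kJ/mol and the distant site is ranked first.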

To  test  this  scoring  function,  we  applied  it  to  predict  the 
SOM  of  21  carboxylic  acids  collated  by  Sykes  et  al.36  that  are 
metabolized  by  CYP2C9.  Table  2  gives  the  performance  of  the 


Table 2. Performance Comparison for CYP2C9-Catalyzed Site of Metabolism (SOM) Predictions on 70 CYP2C9 Substrates^a

| | Score_2C9^b, carboxylic acids | Score_2C9^b, non-acid | Score_2C9^b, overall | StarDrop 5.0^c, carboxylic acids | StarDrop 5.0^c, non-acid | StarDrop 5.0^c, overall |
|---|---|---|---|---|---|---|
| 1st ranked site is the observed SOM | 67 | 65 | 66 | 57 | 63 | 61 |
| 1st or 2nd ranked site is the observed SOM | 81 | 88 | 86 | 71 | 73 | 73 |
| 1st, 2nd, or 3rd ranked site is the observed SOM | 86 | 88 | 87 | 71 | 79 | 77 |

^a Numerical values in the table are percentages of the substrates for
which the observed SOM were correctly ranked. ^b Score_2C9 is
defined by eq 6. ^c Prediction given by StarDrop version 5.0 on 69 of the
70 substrates. StarDrop calculation on one of the 70 compounds failed.


score function as measured by the percentage of cases in which the
experimentally observed SOM were among the top-ranked 1, 2,
and  3  sites.  For  comparison,  the  performance  of  SOM 
prediction  by  StarDrop  version  5.0  is  also  included  in  Table 
2.  By  all  three  performance  measures  given  in  Table  2,  the  score 
function  of  eq  6  outperformed  StarDrop  5.0  for  the  21 
carboxylic  acid  substrates. 
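
The performance measure used in Tables 1 and 2, the percentage of substrates with an observed SOM among the top-ranked 1, 2, or 3 sites, can be computed as in the following Python sketch. The compound names, site labels, and data layout are hypothetical, and the sketch does not handle symmetry-equivalent sites, which would need to be merged before counting.

```python
def top_k_hit_rates(predictions, observed, ks=(1, 2, 3)):
    """Percentage of compounds with at least one observed SOM in the top k ranked sites.

    predictions: dict mapping compound -> list of site labels, best-ranked first
    observed:    dict mapping compound -> set of experimentally observed SOM labels
    """
    rates = {}
    for k in ks:
        hits = sum(
            1 for name, ranked in predictions.items()
            if set(ranked[:k]) & observed[name]
        )
        rates[k] = 100.0 * hits / len(predictions)
    return rates

# Four made-up compounds with ranked site predictions and observed SOM.
predictions = {
    "cpd_a": ["C5", "C2", "C9"],
    "cpd_b": ["C1", "C7", "C3"],
    "cpd_c": ["N4", "C8", "C6"],
    "cpd_d": ["C1", "C3", "C4"],
}
observed = {
    "cpd_a": {"C5"},
    "cpd_b": {"C7"},
    "cpd_c": {"C6"},
    "cpd_d": {"C2"},
}
rates = top_k_hit_rates(predictions, observed)
```

Because a hit at rank k is also a hit at every larger k, the rates are non-decreasing from top-1 to top-3, as seen in every column of the tables.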

To  evaluate  if  it  is  necessary  to  apply  the  progressive  energy 
penalty  for  atoms  in  close  proximity  to  the  carboxyl  group,  we 
also  made  SOM  predictions  using  Score_2D6'  only.  The 
corresponding  values  of  the  correctly  predicted  SOM  for  the  21 
carboxylic acids are 52%, 67%, and 67%, respectively. The
significantly inferior results obtained by using Score_2D6'
underscore the importance of the key substrate-receptor
recognition in determining the regioselectivity of CYP2C9-
catalyzed metabolic reactions. These results also validate the
use of Dist2CO_correction to account for effects of this
substrate-receptor recognition.

We  also  applied  Score_2D6'  to  rank  likely  sites  of  CYP2C9 
metabolism  for  the  49  noncarboxylic  substrates  collated  by 
Sykes  et  al.  For  these  compounds,  the  percentages  of  the 
observed  SOM  being  among  the  top-ranked  1,  2,  and  3  sites 
were  57%,  76%,  and  80%,  respectively.  The  results  are  close  to 
the  performance  of  StarDrop  5.0  for  these  compounds  but 
significantly  worse  than  the  performance  of  Score_2C9  for  the 
carboxylic  acids.  Inspection  of  the  molecular  structures  of  the 
compounds  revealed  that,  even  though  they  are  noncarboxylic 
acids,  many  of  them  have  carbonyl  groups.  Carbonyl  groups  are 
perhaps  weaker  hydrogen  bond  acceptors  than  the  negatively 
charged  carboxyl  groups,  but  nonetheless,  they  are  hydrogen 
bond  acceptors.  There  is  no  reason  to  believe  that  they  do  not 
form  hydrogen  bonding  interactions  with  the  Argl08  or 
Asn204  side  chains  of  the  CYP2C9  enzyme.  Failing  to  account 
for  this  hydrogen  bonding  interaction  might  be  responsible  for 
the  inferior  performance  of  the  score  function  for  noncarboxylic 
substrates.  To  test  this  hypothesis,  we  redefined  Dist2CO  in  eq 
6. For a carboxylic acid or anion, regardless of whether other
carbonyl groups are present, Dist2CO is always defined as the
distance  in  number  of  chemical  bonds  between  a  substrate 
atom  of  interest  and  the  most  distant  carboxylic  carbon  atom  of 
the  substrate.  For  other  substrates,  Dist2CO  is  defined  as  the 
distance  in  number  of  chemical  bonds  between  a  substrate 
atom  of  interest  and  the  most  distant  carbonyl  carbon  of  the 
substrate.  Note  that  for  a  compound  with  both  carboxyl  and 
carbonyl  groups,  the  carboxyl  group  is  given  precedence  in 
calculating  Dist2CO.  This  is  based  on  the  observation  that  the 
carboxyl  group  in  Figure  3  forms  hydrogen  bonding 
interactions with both the Arg108 and Asn204 residues of the
protein.  A  noncarboxyl  carbonyl  group  most  likely  forms 
weaker  hydrogen  bonding  interactions  with  only  one  of  the  two 
protein  residues.  This  is  consistent  with  the  observation  that 
CYP2C9  exhibits  selectivity  for  the  oxidation  of  carboxylic  acid 
substrates.  With  this  modification,  Score_2C9  was  applied  to 
rank  likely  SOM  of  the  49  noncarboxylic  acid  substrates.  Table 
2  shows  that  the  results  were  significantly  improved.  Overall, 
for  the  70  CYP2C9  substrates,  the  observed  SOM  were  found 
in  the  top-ranked  1,  2,  and  3  sites  in  66%,  86%,  and  87%  of  the 
cases,  respectively.  For  comparison,  StarDrop  5.0  was  also  used 
to predict SOM for the 70 compounds. However, our StarDrop
calculations were successful for only 69 of the 70 compounds.
The calculation for zafirlukast failed, presumably due to some
problem in the semiempirical AM1 stage. The corresponding
percentages for the 69 compounds achieved by StarDrop were
61%, 73%, and 77%, respectively. Figure 4 shows molecular
structures of the 70 CYP2C9 substrates, the experimentally
observed SOM, and the three top-ranked sites by Score_2C9.
The fact that the redefined Dist2CO_correction term improved
SOM prediction supports the hypothesis that carbonyl groups
also form hydrogen bonding interactions with CYP2C9, which
influence the SOM of CYP2C9-catalyzed metabolic reactions.

Figure 4. SOM of CYP2C9-catalyzed reactions. Red arrows: observed sites of metabolism. Red numbers: top-ranked sites by Score_2C9. For
molecules with symmetrically equivalent sites, only one of the equivalent sites is labeled.
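
The redefined Dist2CO, with carboxyl groups taking precedence over other carbonyl groups, can be illustrated by a breadth-first search over the 2D bond graph. This is only a sketch under stated assumptions: the names `bond_distances` and `dist2co`, the adjacency-dictionary representation, and the pre-identified lists of carboxyl and carbonyl carbon indices are hypothetical, and the molecule is assumed to be a single connected fragment. Identifying the functional groups themselves is outside the sketch.

```python
from collections import deque
from typing import Dict, List, Optional


def bond_distances(adjacency: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def dist2co(
    adjacency: Dict[int, List[int]],
    atom: int,
    carboxyl_carbons: List[int],
    carbonyl_carbons: List[int],
) -> Optional[int]:
    """Dist2CO for `atom`; carboxyl carbons take precedence over other carbonyls."""
    anchors = carboxyl_carbons if carboxyl_carbons else carbonyl_carbons
    if not anchors:
        return None  # no C=O recognition feature in this substrate
    dist = bond_distances(adjacency, atom)
    # Following the definition in the text, use the most distant anchor carbon.
    return max(dist[c] for c in anchors)


# Hypothetical linear chain of six heavy atoms, 0-1-2-3-4-5.
# Atom 0 plays the role of a carboxyl carbon, atom 3 of a ketone carbon.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

with_carboxyl = dist2co(chain, 5, carboxyl_carbons=[0], carbonyl_carbons=[3])
carbonyl_only = dist2co(chain, 5, carboxyl_carbons=[], carbonyl_carbons=[3])
```

For the terminal atom 5, the carboxyl carbon is used when present (five bonds away), and the ketone carbon (two bonds away) is used only when the substrate has no carboxyl group.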

2.4. SOM Prediction for CYP2C19-Catalyzed Metabolic
Reactions. CYP2C9 and CYP2C19 have a very high level of
sequence  identity.  The  two  proteins  have  only  43  differing 
amino  acid  residues  among  a  total  sequence  length  of  490 
residues.37  However,  they  have  quite  different  substrate 
specificity.  For  instance,  while  CYP2C9  shows  selectivity  for 
and  is  a  major  metabolizer  of  acidic  substrates  that  are  anionic 
under  physiological  pH,  most  signature  CYP2C19  substrates 
are  lipophilic  and  neutral  at  physiological  pH.38  In  addition, 
CYP2C19 is highly selective for 4'-hydroxylation of mephenytoin
and  5-hydroxylation  of  proton  pump  inhibitors  such  as 
omeprazole  and  lansoprazole,  while  CYP2C9  has  little  activity 
for  these  compounds.37  Because  of  this,  mephenytoin, 
omeprazole,  and  lansoprazole  were  termed  marker  substrates 
of  CYP2C19  by  Wada  et  al.39 


Even  though  a  crystal  structure  of  human  CYP2C19  is  not 
yet  available,  mutation  studies  have  identified  some  key  amino 
acids  that  are  crucial  for  conferring  CYP2C9  and  CYP2C19 
substrate  specificity.40  For  example,  molecular  structural 
analysis  indicated  that  the  F-G  loop  forms  a  flexible  lid  and  a 
substrate  entrance  channel  in  CYP2C8,  CYP2C9,  and 
CYP2B4.41-43 Substitution of the F-G loop in CYP2C9 with
that of CYP2C19, i.e., Ser220 → Pro and Pro221 → Thr, did
not alter the enzyme activity toward CYP2C9 marker substrates
but enhanced 4'-hydroxylation of mephenytoin and
5-hydroxylation of omeprazole, which were not detectable in
CYP2C9.39 In addition, mutation of Ile99 to His99 (of
CYP2C19) in CYP2C9 increased omeprazole 5-hydroxylase
activity to ~51% of that of CYP2C19.40 This provides strong evidence
that  amino  acids  99,  220,  and  221  are  among  the  key  residues 
that  determine  the  distinctive  substrate  specificities  of  CYP2C9 
and  CYP2C19. 

The  crystal  structure  of  CYP2C9  indicates  that  Ile99  is  very 
close  to  the  heme,35  and  it  is  expected  that  this  is  also  the  case 
for  His99  in  CYP2C19.  Locuson  et  al.  postulated  that  His99 
plays  an  important  role  in  CYP2C19  metabolism  by  serving  as  a 
hydrogen  bond  donor  and  forming  hydrogen  bonding 
interactions with the carbonyl or sulfinyl oxygen of the
substrate, as illustrated in Figure 5.44 The hydrogen bonding
interactions enhance the affinity of the substrates to the enzyme
and orient the substrates to the nearby heme catalytic site.

Figure 5. Hydrogen bonding interactions, postulated by Locuson et al., between His99 of CYP2C19 and a substrate, and the role of the hydrogen
bonding interactions in the regioselectivity of the metabolic reactions. The sketch was adapted with permission from the paper of Locuson et al.
published in J. Med. Chem. 2004, 47, 6768-6776. Copyright 2004, American Chemical Society.

To examine if the SMARTCyp approach could be extended
to predict SOM of CYP2C19-mediated metabolic reactions, we
collated  36  compounds  metabolized  by  CYP2C19.  The 
observed  CYP2C19  SOM  of  these  compounds  were  obtained 
from  the  papers  of  Lewis  et  al.38  and  Locuson  et  al.44  and  from 
DrugBank.4' 

As  a  first  step,  we  applied  Score_2D6'  to  evaluate  if 
SMARTCyp reactivity alone was sufficient for predicting
SOM of the 36 CYP2C19 substrates. As shown in Figure 6
and Table 3, reasonably reliable predictions were achieved by
Score_2D6' without using any CYP2C19-specific information.
The  observed  sites  of  CYP2C19  metabolism  were  found  to  be 
among  the  top-ranked  1,  2,  and  3  sites  in  69%,  89%,  and  92%  of 
the  cases,  respectively.  However,  for  the  CYP2C19  marker 
substrates  mephenytoin  and  lansoprazole,  the  observed  sites  of 
metabolism  were  second-ranked  sites  by  Score_2D6',  as  shown 
in  Figure  6.  To  improve  prediction  consistent  with  the 
SMARTCyp approach, we adopted the following score function

Score_2C19 = E + Span2End_correction + Dist2XO_correction   (7)

Dist2XO_correction = 10 × (Cutoff − Dist2XO), when Dist2XO
< Cutoff, and Dist2XO_correction = 0 otherwise.

In the preceding equations, XO represents C=O or S=O
moieties, as they were postulated by Locuson et al.44 to be
CYP2C19 substrate recognition features. Dist2XO denotes the
distance in number of chemical bonds between a substrate
atom of interest and the most distant C=O or S=O group of
the substrate.


Test  calculations  indicated  that  a  Cutoff  value  of  4  is 
reasonable.  This  threshold  value  implies  that  substrate  atoms 
within three chemical bonds of C=O or S=O are likely to be
positioned  too  far  from  the  catalytic  heme  moiety  and, 
therefore,  have  lower  probabilities  to  be  sites  of  CYP2C19- 
catalyzed  reactions.  Figure  6  shows  the  top  three  sites  for  each 
substrate  ranked  by  Score_2D6'  and  Score_2C19.  Overall,  an 
observed  SOM  was  found  to  be  among  the  top  1,  2,  and  3  sites 
ranked  by  Score_2C19  78%,  89%,  and  94%  of  the  time, 
respectively,  for  the  36  substrates.  Two  factors  contributed  to 
the  higher  percentage  of  correct  SOM  predictions  for 
CYP2C19  substrates  than  for  CYP1A2  and  CYP2C9  substrates. 
First,  the  number  of  CYP2C19  substrates  we  collected  and  used 
in  the  study  was  relatively  small.  Second,  and  more  importantly, 
the  observed  metabolic  reactions  for  approximately  half  of  the 
CYP2C19  substrates  were  either  N-  or  O-dealkylation 
reactions.  These  reactions  have  significantly  lower  activation 
barriers than most other reactions. As a result, the SOM of N-
and O-dealkylation reactions were correctly predicted by
SMARTCyp  energies  for  most  of  the  compounds. 
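
With Cutoff = 4, the correction in eq 7 adds 30, 20, and 10 to the score of atoms that are 1, 2, and 3 bonds from the C=O or S=O group, and nothing to atoms 4 or more bonds away. The sketch below shows how such a penalty can change the ranking of sites. The function names, the site labels, and all energies are hypothetical values chosen for illustration and are not SMARTCyp output. It assumes that a lower score marks a more likely site, in keeping with the description of the correction as an energy penalty.

```python
CUTOFF = 4  # Cutoff value reported in the text


def dist2xo_correction(dist2xo, cutoff=CUTOFF):
    """Eq 7 penalty: 10 x (Cutoff - Dist2XO) below the cutoff, otherwise 0."""
    return 10.0 * (cutoff - dist2xo) if dist2xo < cutoff else 0.0


# Hypothetical sites: (label, E, Span2End_correction, Dist2XO).
# All numbers are invented for illustration.
sites = [
    ("site_a", 45.0, 0.0, 2),
    ("site_b", 60.0, 0.0, 6),
    ("site_c", 75.0, 0.0, 5),
]


def rank_sites(sites, use_correction):
    """Return site labels sorted from most to least likely (lowest score first)."""
    def score(site):
        _, energy, span2end, dist2xo = site
        total = energy + span2end
        if use_correction:
            total += dist2xo_correction(dist2xo)
        return total

    return [label for label, *_ in sorted(sites, key=score)]


reactivity_only = rank_sites(sites, use_correction=False)
with_recognition = rank_sites(sites, use_correction=True)
```

In this toy case the most reactive site, `site_a`, lies two bonds from the recognition group and receives a penalty of 20, so it drops from first to second place once the correction is applied.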

Figure 6. SOM of CYP2C19-catalyzed reactions. Red arrows: observed sites of metabolic reactions. Blue numbers: top-ranked sites by Score_2D6'
(no CYP2C19-specific information is used). Red numbers: top-ranked sites by Score_2C19. For molecules with symmetrically equivalent sites, only
one of the equivalent sites is labeled.

Table 3. Performance Comparison for CYP2C19-Catalyzed
Site of Metabolism (SOM) Predictions on 36 CYP2C19
Substrates(a)

                                                    Score_2D6'(b)   Score_2C19(c)
1st ranked site is the observed SOM                      69              78
1st or 2nd ranked site is the observed SOM               89              89
1st, 2nd, or 3rd ranked site is the observed SOM         92              94

(a) Numerical values in the table are percentages of the substrates for
which the observed SOM were correctly ranked. (b) Score_2D6' is
defined by eq 3. (c) Score_2C19 is defined by eq 7.

3. CONCLUSIONS

The results of this study, together with those of Rydberg et al.
on CYP3A4 and CYP2D6 SOM predictions, demonstrated that
chemical reactivity is the most important determinant of
substrate SOM of CYP-catalyzed reactions. They also demonstrated
that the activation energies of Rydberg et al., derived
from density functional theory calculations and corrected by
the Span2End descriptor, are reasonable representations of
substrate site reactivity for most CYP enzymes. Highly reliable
SOM predictions for CYP1A2, CYP2C9, and CYP2C19 were
derived based on the reactivity combined with a correction
term to account for the effect of key substrate-enzyme
recognition on substrate access to the catalytic moiety. The fact
that the correction term defined by the distance to a cyclic
carbonyl group significantly improved CYP1A2 SOM predictions
supports the finding that a structured water molecule
in the CYP1A2 substrate binding cavity plays an important role
in CYP1A2 substrate recognition. This water molecule was
observed to be hydrogen-bonded to the carbonyl of a CYP1A2
inhibitor and to the carbonyl of Gly316 of the enzyme.
Similarly, the fact that a correction term, defined by the
distance to a substrate carbonyl or sulfinyl group, improved
CYP2C19 SOM prediction supports the hypothesis of Locuson
et al. on the role of residue His99 in CYP2C19 metabolism.

In  principle,  docking  of  substrates  into  the  protein  structures 
should  give  reliable  prediction  of  substrate  sites  that  may  be 
accessible  to  the  catalytic  moiety.  However,  docking 
simulations  are  much  more  computationally  demanding  than 
the  2D  substrate  structure  based  SMARTCyp  approach  for 
SOM  prediction.  In  addition,  it  is  challenging  to  properly 
account  for  protein  flexibility  in  a  docking  simulation,  and  it 
requires  expert  knowledge  to  select  an  appropriate  docking 
score  function  to  rank  the  docked  poses.  As  a  result,  docking  is 
more  of  an  expert  tool  that  requires  significant  knowledge  and 
experience  to  process  computational  results  and  select  relevant 
poses  from  a  large  number  of  docked  conformations.  On  the 
other  hand,  the  SMARTCyp  approach  for  predicting  likely  sites 
of  CYP-catalyzed  metabolic  reactions  is  very  fast  as  it  uses  2D 
molecular structure information only and is easily applied by
anyone without the need for specialized training. The score
functions  for  ranking  likely  sites  of  metabolism  by  CYP1A2, 
2C9,  and  2C19  enzymes  as  described  in  this  article  were 
implemented  as  an  extension  of  the  SMARTCyp  2.0  program 
by  modifying  the  program  modules.  With  the  extension  to 
cover  the  three  additional  CYP  isoforms,  the  program  can  now 
successfully  predict  SOM  for  all  five  major  drug  metabolizing 
CYP  enzymes.  Both  the  Java  source  code  and  the  binary 
executable  of  the  program  are  freely  available  for  download  at 
http://www.bhsai.org/downloads/smartcyp_ext/.